Hierarchical Average Reward Reinforcement Learning
Authors
Mohammad Ghavamzadeh, Sridhar Mahadevan
Abstract
Hierarchical reinforcement learning (HRL) is the study of mechanisms for exploiting the structure of tasks in order to learn more quickly. By decomposing tasks into subtasks, fully or partially specified subtask solutions can be reused in solving tasks at higher levels of abstraction. The theory of semi-Markov decision processes (SMDPs) provides a theoretical basis for HRL. Several variant representational schemes based on SMDP models have been studied in previous work, all of which rely on the discrete-time discounted SMDP model. In this approach, policies are learned that maximize the long-term discounted sum of rewards. In this paper we investigate two formulations of HRL based on the average-reward SMDP model, for both discrete time and continuous time. In the average-reward model, policies are sought that maximize the expected reward per step. The two formulations correspond to two different notions of optimality that have been explored in previous work on HRL: hierarchical optimality, which corresponds to the set of optimal policies in the space defined by a task hierarchy, and a weaker local notion called recursive optimality. What distinguishes the two models in the average-reward framework is the optimization of subtasks. In the recursively optimal framework, subtasks are treated as continuing tasks and are solved by finding gain-optimal policies given the policies of their children. In the hierarchical optimality framework, the aim is to find a globally gain-optimal policy within the space of policies defined by the hierarchical decomposition. We present algorithms that learn to find recursively and hierarchically optimal policies under discrete-time and continuous-time average-reward SMDP models. We use four experimental testbeds to study the empirical performance of our proposed algorithms. The first two domains are relatively simple: a small autonomous guided vehicle (AGV) scheduling problem and a modified version of the well-known Taxi problem. The other two domains are larger real-world single-agent and multiagent AGV scheduling problems. We model these AGV scheduling tasks using both discrete-time and continuous-time models and compare the performance of our proposed algorithms with each other, with other HRL methods, and with standard Q-learning. In the large AGV domain, we also show that our proposed algorithms outperform widely used industrial heuristics such as "first come first served", "highest queue first", and "nearest station first".
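To make the average-reward objective concrete, the following Python fragment sketches a flat, discrete-time average-reward Q-learning update in the style of R-learning, the kind of building block the paper's hierarchical algorithms apply inside subtasks. All names (q, gain, alpha, beta) are illustrative assumptions, not the paper's notation:

    from collections import defaultdict

    alpha, beta = 0.1, 0.01   # assumed step sizes for values and gain
    q = defaultdict(float)    # Q-values indexed by (state, action)
    gain = 0.0                # running estimate of the reward per step

    def update(state, action, reward, next_state, actions):
        """Average-reward TD update: the target subtracts the gain
        estimate instead of discounting the successor value."""
        global gain
        was_greedy = q[(state, action)] == max(q[(state, a)] for a in actions)
        best_next = max(q[(next_state, a)] for a in actions)
        td = reward - gain + best_next - q[(state, action)]
        q[(state, action)] += alpha * td
        if was_greedy:
            gain += beta * td   # adjust the gain on greedy steps only

The two formulations described in the abstract differ in how this kind of update is organized over the task hierarchy: whether the gain is estimated separately per subtask or once for the overall task.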
Similar Resources
Continuous-Time Hierarchical Reinforcement Learning
Hierarchical reinforcement learning (RL) is a general framework which studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work in hierarchical RL, such as the MAXQ method, has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. This paper generalizes the MAXQ method to continuous-time discounte...
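For reference, the continuous-time discounted SMDP backup that such a generalization builds on discounts the successor value by exp(-beta * tau), where tau is the random duration of the executed activity. A minimal Python sketch under assumed names (beta, tau, q as a defaultdict(float)), not taken from the MAXQ paper itself:

    import math

    alpha, beta = 0.1, 0.05   # assumed learning rate and discount rate

    def smdp_update(q, state, action, reward, tau, next_state, actions):
        """SMDP Q-learning backup for continuous time: 'reward' is the
        return accumulated during the action's duration tau, and the
        successor value is discounted by exp(-beta * tau)."""
        best_next = max(q[(next_state, a)] for a in actions)
        target = reward + math.exp(-beta * tau) * best_next
        q[(state, action)] += alpha * (target - q[(state, action)])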
Hierarchical Functional Concepts for Knowledge Transfer among Reinforcement Learning Agents
This article introduces the notions of functional space and concept as a way of knowledge representation and abstraction for Reinforcement Learning agents. These definitions are used as a tool of knowledge transfer among agents. The agents are assumed to be heterogeneous: they have different state spaces but share the same dynamics, reward, and action space. In other words, the agents are assumed t...
Hierarchically Optimal Average Reward Reinforcement Learning
Two notions of optimality have been explored in previous work on hierarchical reinforcement learning (HRL): hierarchical optimality, or the optimal policy in the space defined by a task hierarchy, and a weaker local model called recursive optimality. In this paper, we introduce two new average-reward HRL algorithms for finding hierarchically optimal policies. We compare them to our previously r...
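The structural difference between the two notions can be summarized in a hypothetical Python sketch (all names assumed): recursively optimal learners keep one gain estimate per subtask, while hierarchically optimal learners share a single global gain across the hierarchy:

    subtasks = ["root", "navigate", "pickup", "putdown"]   # illustrative hierarchy

    # Recursive optimality: each subtask is treated as a continuing
    # problem with its own average reward (gain).
    subtask_gain = {m: 0.0 for m in subtasks}

    # Hierarchical optimality: a single gain for the overall task is
    # estimated and used in the updates of every subtask.
    global_gain = 0.0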
Extending Hierarchical Reinforcement Learning to Continuous-Time, Average-Reward, and Multi-Agent Models
Hierarchical reinforcement learning (HRL) is a general framework that studies how to exploit the structure of actions and tasks to accelerate policy learning in large domains. Prior work on HRL has been limited to the discrete-time discounted reward semi-Markov decision process (SMDP) model. In this paper we generalize the setting of HRL to average-reward, continuous-time and multi-agent SMDP mo...
Hierarchical Reinforcement Learning Based Self-balancing Algorithm for Two-wheeled Robots
Self-balancing control is the basis for applications of two-wheeled robots. In order to improve the self-balancing of two-wheeled robots, we propose a hierarchical reinforcement learning algorithm for controlling the balance of two-wheeled robots. After describing the subgoals of hierarchical reinforcement learning, we extract features for subgoals, define a feature value vector and it...
Journal: Journal of Machine Learning Research
Volume: 8
Issue: -
Pages: -
Published: 2007